Equitability, interval estimation, and statistical power
Authors
Abstract
As data sets grow in dimensionality, non-parametric measures of dependence have seen increasing use in data exploration due to their ability to identify non-trivial relationships of all kinds. One common use of these tools is to test a null hypothesis of statistical independence on all variable pairs in a data set. However, because this approach attempts to identify any non-trivial relationship no matter how weak, it is prone to identifying so many relationships — even after correction for multiple hypothesis testing — that meaningful follow-up of each one is impossible. What is needed is a way of identifying a smaller set of “strongest” relationships of all kinds that merit detailed further analysis. Here we formally present and characterize equitability, a property of measures of dependence that aims to overcome this challenge. Notionally, an equitable statistic is a statistic that, given some measure of noise, assigns similar scores to equally noisy relationships of different types (e.g., linear, exponential, etc.) [1]. We begin by formalizing this idea via a new object called the interpretable interval, which functions as an interval estimate of the amount of noise in a relationship of unknown type. We define an equitable statistic as one with small interpretable intervals. We then draw on the equivalence of interval estimation and hypothesis testing to show that under moderate assumptions an equitable statistic is one that yields well powered tests for distinguishing not only between trivial and non-trivial relationships of all kinds but also between non-trivial relationships of different strengths, regardless of relationship type. This means that equitability allows us to specify a threshold relationship strength x0 below which we are uninterested, and to search a data set for relationships of all kinds with strength greater than x0. 
Thus, equitability can be thought of as a strengthening of power against independence that enables fruitful analysis of data sets with a small number of strong, interesting relationships and a large number of weaker, less interesting ones. We conclude with a demonstration of how our two equivalent characterizations of equitability can be used to evaluate the equitability of a statistic in practice.
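The evaluation procedure mentioned at the end of the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' method: it assumes R^2 as the measure of noise, squared Pearson correlation as the statistic under test, and a hypothetical set of three relationship types. It also uses a simplified reading of the interpretable interval, taking it as the range of R^2 values whose central (1 - alpha) sampling interval of the statistic contains the observed score. All function names, sample sizes, and the grid of noise levels are illustrative choices.

```python
import numpy as np

# Illustrative relationship types on [0, 1]; this set is an assumption.
FUNCTIONS = {
    "linear": lambda x: x,
    "parabolic": lambda x: 4.0 * (x - 0.5) ** 2,
    "sinusoidal": lambda x: np.sin(4.0 * np.pi * x),
}


def sample_relationship(f, r2, n, rng):
    """Draw n points from Y = f(X) + Gaussian noise, with noise scaled so that
    the R^2 between f(X) and Y is approximately r2 (requires 0 < r2 <= 1)."""
    x = rng.uniform(0.0, 1.0, size=n)
    signal = f(x)
    # R^2 = Var(f) / (Var(f) + Var(noise)), solved for the noise variance.
    noise_var = np.var(signal) * (1.0 - r2) / r2
    y = signal + rng.normal(0.0, np.sqrt(noise_var), size=n)
    return x, y


def pearson_r2(x, y):
    """Squared Pearson correlation, used here only as an example statistic."""
    return float(np.corrcoef(x, y)[0, 1] ** 2)


def reliable_intervals(statistic, r2_grid, n=200, trials=100, alpha=0.1, seed=0):
    """For each (relationship type, R^2) pair, estimate the central
    (1 - alpha) interval of the statistic's sampling distribution."""
    rng = np.random.default_rng(seed)
    intervals = {}
    for name, f in FUNCTIONS.items():
        for r2 in r2_grid:
            scores = [statistic(*sample_relationship(f, r2, n, rng))
                      for _ in range(trials)]
            lo, hi = np.quantile(scores, [alpha / 2.0, 1.0 - alpha / 2.0])
            intervals[(name, float(r2))] = (float(lo), float(hi))
    return intervals


def interpretable_interval(intervals, y):
    """Smallest interval containing every R^2 whose reliable interval
    (for any relationship type) contains the observed score y."""
    xs = [r2 for (_, r2), (lo, hi) in intervals.items() if lo <= y <= hi]
    return (min(xs), max(xs)) if xs else None


r2_grid = np.round(np.linspace(0.1, 1.0, 10), 1)
intervals = reliable_intervals(pearson_r2, r2_grid)
for y in (0.1, 0.5, 0.9):
    print(y, interpretable_interval(intervals, y))
```

Under this reading, a statistic is more equitable on the chosen relationship set when the printed intervals are narrow. Squared Pearson correlation should do poorly here, since it scores even a noiseless parabola near zero, so a low score is compatible with a wide range of noise levels.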
Similar resources
Theoretical Foundations of Equitability and the Maximal Information Coefficient
The maximal information coefficient (MIC) is a tool for finding the strongest pairwise relationships in a data set with many variables [1]. MIC is useful because it gives similar scores to equally noisy relationships of different types. This property, called equitability, is important for analyzing high-dimensional data sets. Here we formalize the theory behind both equitability and MIC in the ...
Equitability, mutual information, and the maximal information coefficient.
How should one quantify the strength of association between two random variables without bias for relationships of a specific form? Despite its conceptual simplicity, this notion of statistical "equitability" has yet to receive a definitive mathematical formalization. Here we argue that equitability is properly formalized by a self-consistency condition closely related to Data Processing Inequa...
An Empirical Study of Leading Measures of Dependence
In exploratory data analysis, we are often interested in identifying promising pairwise associations for further analysis while filtering out weaker, less interesting ones. This can be accomplished by computing a measure of dependence on all possible variable pairs and examining the highest-scoring pairs, provided the measure of dependence used assigns similar scores to equally noisy relationsh...
An Empirical Study of the Maximal and Total Information Coefficients and Leading Measures of Dependence
In exploratory data analysis, we are often interested in identifying promising pairwise associations for further analysis while filtering out weaker ones. This can be accomplished by computing a measure of dependence on all variable pairs and examining the highest-scoring pairs, provided the measure of dependence used assigns similar scores to equally noisy relationships of different types. Thi...
Cleaning up the record on the maximal information coefficient and equitability.
Although we appreciate Kinney and Atwal’s interest in equitability and maximal information coefficient (MIC), we believe they misrepresent our work. We highlight a few of our main objections below. Regarding our original paper (1), Kinney and Atwal (2) state “MIC is said to satisfy not just the heuristic notion of equitability, but also the mathematical criterion of R equitability,” the latter ...
Journal title:
- CoRR
Volume: abs/1505.02212
Publication date: 2015